DECODER AND METHOD OF DECODING USING PSEUDO TWO PASS 
DECODING AND ONE PASS ENCODING 

FIELD OF THE INVENTION 

The present invention relates to a decoder and a method and system of video data 
decoding, and in particular the decoding of MPEG video bitstreams, that provides memory 
savings in frame buffers. 

BACKGROUND OF THE INVENTION 

There is a continuous need for reducing the memory requirements of a video decoder in 
order to reduce costs. However, it is also expected that the subjective quality of the 
decoded video will not deteriorate as a result. Memory reductions (for example in the ratio 
of 10:1 or higher) will also enable the effective embedding of memory components within 
a hardware decoder system component, similar to the embedded dynamic random access 
memory ("embedded DRAM") technology. 

For example, a typical video decoder chip like an MPEG-2 decoder uses a significant 
amount of memory for storing the data in frame buffers to be used for decoding temporally 
linked video frames, video post-processing and for storing on-screen display information 
for feature enhancements. The reduction of memory requirements, especially in relation to 
video decoding and post-processing, has been the subject of much research since it can 
provide significant savings in manufacturing costs. 

Memory reduction with decimation in the spatial domain causes blurring of the image 
while decimation in the frequency domain by applying a fixed bit rate to encode a 
macroblock or block, as suggested in the prior art, causes unpredictable artefacts which are 
especially apparent in fast moving video sequences. 



Another challenge of recompression is the precision of bit rate control. Since the size of 
the physical memory to be used in a system may be fixed, the rate control of the variable- 
length encoding circuit (or entropy encoding) must be accurate, such that the maximum 
memory is utilized without exceeding the allocated memory size. Known methods such as 
virtual-buffer-fullness control may not be used independently since the variations in the 
generation of bits would not be ideal for a fixed and maximally utilised memory buffer. A 
tighter control of accuracy for the virtual-buffer-fullness method results in the degradation 
of picture quality, while better picture quality is associated with a high variation in the bit 
rate. 

SUMMARY OF THE INVENTION 

The present invention provides a method of processing video frame data, including the 
steps of: 

(a) receiving a video frame; 

(b) partially decoding the video frame; 

(c) fully decoding the video frame to produce macroblocks; 

(d) determining video data parameters from the partially decoded video frame 
or both the partially and fully decoded video frame; 

(e) encoding the macroblocks based on the determined video data parameters to 
provide a compressed video frame for subsequent display. 

In another aspect, the invention provides a video decoder adapted to perform the above 
method. 

The present invention further provides a video decoder including: 

(a) a bitstream parser for receiving a video frame; 

(b) an embedded decoder for partially decoding the video frame and fully 
decoding the video frame to produce macroblocks; 

(c) a data analyzer for determining video data parameters from the partially 
decoded video frame or both the partially and fully decoded video frame; 

(d) an embedded encoder for encoding the macroblocks based on the 
determined video data. 



To achieve a significant compression ratio while rendering acceptable picture quality and 
minimizing implementation complexity, the present invention provides a method of 
applying pseudo two-pass decoding and one-pass encoding of input video data. In the first 
pass of the two-pass decoding process, the input video bitstream is decoded partially to 
extract useful picture statistics for use in the subsequent one pass encoding process. The 
second pass of the two-pass decoding process is performed in parallel with the one-pass 
encoding process, which allows the storage of the decoded picture within a target-reduced 
amount of memory. 

The preferred embodiment re-encodes each anchor frame (which may contain either an I- 
Picture or P-Picture) as an I-Picture to the desired memory compression ratio with 
minimum trade-off in picture quality and system complexity. Also, for cases where the 
display resolution is less than the bitstream resolution, a non-anchor picture (i.e. a B- 
Picture) can be decoded on-the-fly using a two-pass decoding technique to significantly 
reduce the overall memory requirements of the system. 

In the case of a standard MPEG decoder, the one-pass encoding process takes the 
techniques of standard intra macroblock encoding with Discrete Cosine Transform 
("DCT"), quantization, the scanning of DCT coefficients in a zig-zag pattern, and 
Variable-Length Coding ("VLC"). To minimise picture degradation and maximise the 
useful picture statistics that can be extracted for all frames during the first pass of the 
two-pass decoding process, the one-pass encoding process used in a preferred embodiment 
utilises encoding techniques similar to those described for a standard MPEG encoder. 

Embodiments of the invention also relate to a system for performing two-tiered rate control 
that re-compresses video data using present compression techniques. Based on the picture 
statistics derived from the first pass of the two-pass decoding process, the two-tiered rate 
control scheme is applied to the one-pass encoding process to determine the quantizer scale 
for efficient bit allocation. The determination of a suitable quantizer scale enables 
effective compression while maintaining good picture quality. Artefacts caused by the 
dropping of DCT coefficients during quantization are significantly reduced since 
decimation in the frequency domain is mostly performed within the ideal compression 
limit of the system, but rarely within the expansion range of the quantization process. 
Typically, the compression scheme supports a 10:1 memory reduction per frame buffer. 

In addition, the rate control scheme is able to stabilise the bit rate generation of the 
compression scheme and maintain variations in the bit rate generally within 10% of the 
average bit rate. This rate control accuracy makes the present system suitable for use in 
systems with fixed memory buffers. Hence a video decoder with an embedded memory 
system may be built in accordance with the principles of the present invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Preferred embodiments of the present invention are hereinafter described, by way of 
example only, with reference to the accompanying drawings, wherein: 

FIG. 1 is an illustration of the decoding of an interpolated macroblock in the B-Picture; 

FIG. 2 is a flow diagram illustrating the interaction of various modules employed in 
a compression scheme of an embodiment of the present invention; 

FIG. 3 is an example of an implementation architecture of a reduced memory video 
decoder of an embodiment of the present invention; 

FIG. 4 is a timing diagram illustrating an example of how the present invention 
processes an MPEG-2 video bitstream with a frame encoding sequence of {I, P, B, ...}; 

FIG. 5A is a graphical representation of the normal case of maximal overlap of the 
predicted macroblock on the macroblock grid of a compressed frame; 

FIG. 5B is a graphical representation of the boundary case of equal overlap on 4 
macroblocks; 

FIG. 5C is a graphical representation of the boundary case of equal overlap on 2 of 
the 4 macroblocks; 

FIG. 6 is a block diagram of a rate control circuit according to an embodiment of 
the invention; 

FIG. 7 is a block diagram of the embedded encoder module shown in Figure 3; 

FIG. 8 is a block diagram of a decoder of another embodiment of the present 
invention that supports lower resolution picture or zoom-out picture decoding. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE 
INVENTION 

Embodiments of the present invention are applicable to an MPEG-2 video decoder. The 
MPEG-2 specifications (also referred to as "ISO/IEC 13818") achieve significant 
compression by removing temporal redundancy between frames close in time. This is 
done in addition to removing spatial and statistical redundancy within a frame by DCT or 
entropy encoding and lossy compression by quantization. The temporal element implies 
that the encoding of a frame is not limited to information within the frame itself but 
may draw on information that spans several pictures. In MPEG-2, the term "picture" refers to 
either a frame or a field. Therefore, a coded representation of a picture may be 
reconstructed into a frame or a field. 

Figure 1 illustrates an example of MPEG decoding of a macroblock in a B-Picture 202, 
which in this example requires forward prediction from an I-Picture 201 and backward 
prediction from a P-Picture 203. The sequence of frames 204 reflects the order in which 
the frames will be displayed, starting from the I-Picture 201. An I-Picture 201 (short for 
"intra picture") is encoded purely with information within the picture. A P-Picture 203 
(short for "predicted picture") is encoded with information from an earlier I-Picture 201, or 
from an earlier P-Picture, in addition to information representing the current frame. A B- 
Picture 202 (short for "bidirectional picture") is encoded with information from both or 
either of an I-Picture 201 and P-Picture 203 to be displayed earlier and later than the 
current B-Picture 202. Even with "B-on-the-Fly" decoding, which involves the direct 
decoding of B-Picture bitstreams for display without intermediate storage, at least two 
anchor frames (i.e. an I-Picture or P-Picture) are required. Each anchor frame has a 
maximum size of approximately 5 megabits (derived by: 720 horizontal pixels x 576 lines 
x 1.5 bytes per pixel x 8 bits per byte ≈ 5 megabits) in the case of a PAL picture of D1 
resolution. 
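
The frame size quoted above can be checked with a short calculation. The following is a minimal sketch that assumes 4:2:0 sampling at 1.5 bytes per pixel, as stated in the text; the function name frame_size_bits is illustrative only and does not appear in the specification.

```python
def frame_size_bits(width: int, height: int, bytes_per_pixel: float = 1.5) -> int:
    """Uncompressed size of one frame in bits (1.5 bytes/pixel assumes 4:2:0)."""
    return int(width * height * bytes_per_pixel * 8)


# PAL picture of D1 resolution, as used in the text.
pal_d1_bits = frame_size_bits(720, 576)

# Two anchor frames are needed even with B-on-the-Fly decoding.
two_anchor_frames_bits = 2 * pal_d1_bits

print(pal_d1_bits, two_anchor_frames_bits)
```

The result for one frame is 4,976,640 bits, which is the "approximately 5 megabits" figure; two uncompressed anchor frames therefore need roughly 10 megabits before any recompression.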

It is desired to re-encode each anchor frame (i.e. an I-Picture or P-Picture) as an I-Picture 
to the desired memory compression ratio with minimum trade-off in picture quality and 
system complexity. 

Figure 2 shows the general operation of the memory reduction scheme in the decoder of a 
preferred embodiment. The MPEG video bitstream is partially decoded by a Bitstream 
Parser circuit 301, which applies variable-length decoding on the video bitstream to obtain 
the quantized DCT coefficients and macroblock information for all macroblocks and 
related picture information in each frame being processed. This is the first pass of the two- 
pass decoding process. The information from the partially decoded MPEG video bitstream 
is passed to a Data Analyzer circuit 302, which further extracts picture statistics and 
macroblock information in relation to each frame being processed. The Bitstream Parser 
301 retrieves macroblock and picture information from the pictures stored in compressed 
frame buffers and the Data Analyzer circuit 302 uses this further macroblock information 
and picture statistics as described later. 

An MPEG-2 Decoder circuit 303 completely decodes the MPEG video bitstream to 
produce macroblocks, which are later re-encoded by an Embedded Encoder 308 for storage 
in the compressed frame buffers and referenced by an address. The stored macroblocks 
may be later retrieved from memory for the motion compensation of predicted 
macroblocks, which may be decoded from later frames, thus forming the final pixel values 
in preparation for display. At the same time, macroblock information is passed to the Data 
Analyzer 302 as it is generated by the MPEG-2 Decoder 303. The Data Analyzer 302 uses 
more information from decoding (not shown) of the compressed frame buffers and 
computes important macroblock parameters (described below). 






A Rate Control circuit 304 then uses the earlier computed picture statistics and macroblock 
parameters from the Data Analyzer circuit 302 to derive a suitable quantizer scale for the 
Embedded Encoder 308. A Macroblock Bit Allocation circuit 305 first allocates the target 
macroblock bits based on the scaled macroblock complexity. The bit allocation process 
allocates the number of bits to be used for encoding the AC coefficients of each 
macroblock using scaled macroblock complexity together with adjustments from a 
proportional and integral error feedback controller. 

The equation governing the bit allocation process is given as follows: 

[equation not reproduced in the source text] 

where s_i^t is the target number of bits for encoding AC coefficients for the i-th macroblock; 
z_i is the estimated complexity for the i-th macroblock; 
s_pic^t is the target number of bits for encoding all AC coefficients, from current picture 
statistics; 
the local error term is the number of bits for encoding the error of the AC coefficients 
for the (i-1)th macroblock; 
X_pic is the estimated complexity from current picture statistics; 
the accumulated error term is the accumulation of the number of bits for encoding the 
error of the AC coefficients from the 0th macroblock up to the (i-1)th macroblock; and 
γ denotes the two feedback control constants, one being the local proportional error 
feedback control constant and the other the integral error feedback control constant. 



For the present system the macroblock complexity, z, is generally defined as: 

z = s × q 

where s is the number of bits used for encoding AC coefficients of a macroblock; and 
q is the quantizer scale used for the macroblock. 

The process of scaled macroblock complexity bit allocation is developed on the finding 
that the macroblock complexity is relatively constant over a range of quantizer scales. A 
smaller quantizer scale generates more bits for encoding the AC coefficients of the same 
macroblock while a macroblock with higher complexity requires more bits for encoding its 
AC coefficients for the same quantizer scale. 
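
The bit allocation step described above can be illustrated with a small sketch. The exact allocation equation is not legible in this text, so the form used here is an assumption: the picture-level AC bit budget is shared in proportion to macroblock complexity, and the result is corrected by a proportional term on the previous macroblock's bit error and an integral term on the accumulated error. The names complexity and allocate_bits, the sign convention (error = bits produced minus bits targeted) and the default constants are all hypothetical.

```python
def complexity(ac_bits: float, quantizer_scale: float) -> float:
    """Macroblock complexity, taken as AC bit count times quantizer scale."""
    return ac_bits * quantizer_scale


def allocate_bits(z_i: float, x_pic: float, s_pic_target: float,
                  prev_error: float, accumulated_error: float,
                  gamma_p: float = 0.5, gamma_i: float = 0.1) -> float:
    """Target AC bits for one macroblock (assumed form, not the patent's equation).

    z_i / x_pic is the macroblock's share of the picture complexity; the two
    gamma terms pull the target down when earlier macroblocks overspent.
    """
    scaled_share = (z_i / x_pic) * s_pic_target
    target = scaled_share - gamma_p * prev_error - gamma_i * accumulated_error
    return max(0.0, target)  # a bit budget cannot be negative


# A macroblock that used 100 AC bits at quantizer scale 8 in the source stream.
z = complexity(100, 8)
print(allocate_bits(z, x_pic=8000, s_pic_target=1000,
                    prev_error=0, accumulated_error=0))
```

Because complexity stays roughly constant across quantizer scales, the same z value can be used to budget bits at the new, coarser quantizer chosen for recompression.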

The decoder differentiates between the number of bits generated by the AC and DC 
coefficients. As DC coefficients affect the contrast of the picture, it is critical that the 
recompression technique used in the Embedded Encoder 308 uses the same intra-DC 
precision and prevents any further loss of information. This differentiation also benefits 
the second process in the rate control scheme, known as the quantizer scale prediction. 

A Quantizer Scale Prediction circuit 306 then predicts the corresponding quantizer scale to 
be used in encoding the AC coefficients of each macroblock using an inverse complexity 
relation with adjustments made by mismatch control. The equation governing the 
quantizer scale determination is given as follows: 

[equation not reproduced in the source text] 

where q_i is the predicted quantizer scale for the i-th macroblock; 
z_i is the estimated complexity of the macroblock; 
s_i^t is the target number of bits used for encoding AC coefficients for the 
macroblock; 
the local error term is the number of bits for encoding the error of the AC coefficients 
for the macroblock; 
the accumulated error term is the accumulation of the number of bits for encoding the 
error of the AC coefficients from the 0th macroblock up to the macroblock; 
α is the local proportional error feedback control constant; and 
β is the integral error feedback control constant. 
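
The quantizer scale prediction can be sketched in the same spirit. The governing equation is not legible in this text, so the form below is an assumption built from the description: an inverse complexity relation (complexity divided by target bits) plus proportional and integral corrections. The function name, the default constants, and the clamp to the range 1 to 31 are hypothetical choices for illustration.

```python
def predict_quantizer_scale(z_i: float, s_target: float,
                            prev_error: float, accumulated_error: float,
                            alpha: float = 0.01, beta: float = 0.001,
                            q_min: float = 1.0, q_max: float = 31.0) -> float:
    """Predicted quantizer scale for one macroblock (assumed form).

    The base value inverts the complexity relation z = s * q. Positive bit
    errors (overspending so far) push the quantizer scale up, which makes
    the next macroblock cheaper to encode.
    """
    base = z_i / max(s_target, 1.0)  # guard against a zero bit target
    adjusted = base + alpha * prev_error + beta * accumulated_error
    return min(q_max, max(q_min, adjusted))


print(predict_quantizer_scale(800, 100, prev_error=0, accumulated_error=0))
```

With no error feedback, a macroblock of complexity 800 and a 100-bit target yields a quantizer scale of 8, which is simply the inverse of the complexity definition.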

In addition, normalization may be applied to the macroblock complexity, where the 
normalised macroblock complexity is given by: 

[equation not reproduced in the source text] 

where z̄ is the average macroblock complexity of the previous re-encoded picture. 
Normalization is used to achieve better subjective quality and higher rate control accuracy. 
Normalized complexity reduces the differences between macroblock complexities, giving 
a higher rate control accuracy. In addition, high complexity regions are quantized more 
coarsely without subjective compromise, translating into bit savings for sensitive low 
complexity regions that are quantized more finely, and thus achieving a better subjective 
quality. Quantization comparison is made between the normalized and non-normalized 
macroblock complexity. 

The rate control accuracy is guaranteed by a Feedback Control circuit 307 that implements 
a two-tier proportional integral ("PI") control loop (illustrated in Figure 6). The inner loop 
tightly controls the quantizer scale prediction accuracy at the macroblock level, while the 
outer loop compensates for the offset created by the inner loop and converges the encoding 
bit count to the target picture bit count. 

With the derived quantizer scale from the Rate Control circuit 304 and the encoding 
parameters from the Data Analyzer 302, the Embedded Encoder 308 encodes the stream of 
decoded pixel values from the MPEG-2 Decoder 303 via the various video compression 
techniques in the MPEG-2 specifications, namely DCT, quantization and variable-length 
encoding, to form a video bitstream for storage in the compressed frame buffers. 

Preferably, the Rate Control circuit 304 works in real-time, such that the Feedback Control 
circuit 307 receives the actual bit count of the video stream as it is leaving the Embedded 
Encoder 308 and makes any adjustments using the Macroblock Bit Allocation circuit 305 
or Quantizer Scale Prediction circuit 306 for the next macroblock in the pipeline. 

Figure 3 shows an embodiment of the decoder, illustrating an example of an 
implementation architecture of B-on-the-Fly MPEG video stream decoding working in 
conjunction with a system providing reduced memory MPEG video stream decoding. To 
illustrate how the system operates, it is assumed that the system will process an MPEG-2 
video bitstream consisting of frames in the display sequence {I, B, P, ...}, with a 
corresponding frame encoding sequence of {I, P, B, ...}. The video bitstream is received 
and stored in a Bitstream Buffer 401. After the required Video Buffering Verifier 
("VBV") delay, as defined in the MPEG-2 specifications, the bitstream for the entire 
picture is stored in the Bitstream Buffer 401 and ready for decoding. 

The decoding of each picture is performed by a pseudo two-pass decoding and one-pass 
encoding process, where the first phase of the two-pass decoding process, represented by a 
first field time of a frame to be processed, involves extracting macroblock and picture 
information and computing picture statistics (described below) for the current frame. The 
second phase of the two-pass decoding process (represented by a second field time of the 
frame to be processed) involves extracting macroblock information for the current frame 
and also the complete decoding of the current frame. The second phase is performed in 
parallel with the one-pass encoding for the same frame. The encoded video data is then 
stored in compressed frame buffers 407. 

In a first field time of the first frame, t1, a Bitstream Parser 402 decodes the first I-Picture 
sufficiently for a Data Analyzer 403 to perform picture statistics computations. In a 
second field time of the first frame, t2, an MPEG Decoder 404 decodes the first I-Picture 
completely and feeds forward information to the Data Analyzer 403 which computes 
macroblock characteristics. In the same time period, a Rate Control circuit 405 uses the 
computed parameters to determine a suitable quantizer scale. An Embedded Encoder 406 
encodes the original I-Picture as a new I-Picture using the parameters from both the Data 
Analyzer 403 and the Rate Control circuit 405 at a macroblock latency relative to the 
MPEG Decoder 404. The final video bitstreams are stored in the Compressed Frame 
Buffers 407 and the address location of each macroblock in the Compressed Frame Buffers 
407 is mapped into a Macroblock Pointer Table. 

In a first field time of the second frame, t3, the Bitstream Parser 402 decodes the P-Picture 
in the second frame sufficiently for the Data Analyzer 403 to perform picture statistics 
computations. The decoding of the P-Picture is similar to that of I-Pictures except that the 
motion vectors are used to locate predicted macroblock properties, thereby providing a good 
estimate for the current inter-coded macroblock properties. Reference herein shall be 
made to a top-field first video image sequence. It is assumed that the processing of all 
video images begins from line 0, corresponding to the first line of an image. At the same 
time a Display Decoder 408 decodes the top field (or only the even-numbered lines) of the 
I-Picture retrieved from the Compressed Frame Buffers 407 and passes the decoded picture 
to the Standard Display 409 for further processing before it is displayed. In a second field 
time of the second frame, t4, the operations on the P-Picture are similar to those for the I- 
Picture (i.e. it is encoded by the Embedded Encoder 406 as a new I-Picture and then stored 
in the Compressed Frame Buffers 407), except that the motion vectors of the P-Picture as 
decoded by the MPEG Decoder 404 are used to locate the corresponding predicted 
macroblocks in the Compressed Frame Buffers 407 using the Macroblock Pointer Table. 
The located reference macroblocks are retrieved and decoded by an Embedded Decoder 
410, which operates concurrently with the MPEG Decoder 404 to produce the predicted 
pixel values for the motion compensated picture. In the case of I-Pictures which have 
concealed motion vectors, a procedure similar to that for P-Pictures is followed. At the 
same time, the Display Decoder 408 decodes the bottom field (or only the odd-numbered 
lines) of the I-Picture from the Compressed Frame Buffers 407 and passes the decoded 
picture to the Standard Display 409 for further processing before it is displayed. 
In a first field time of the third frame, t5, the MPEG Decoder 404 decodes the video 
bitstream of the B-Picture in the Bitstream Buffer 401. However, in the case of a B-frame 
picture, only macroblocks of selected motion vectors representative of the top field (i.e. 
only the even-numbered lines) are decoded. The motion compensated top field pixels are 
transmitted to the Standard Display 409 for further processing before the top field is 
displayed. A similar operation is performed on the bottom field (i.e. only the odd- 
numbered lines) during a second field time of the third frame, t6. This method of directly 
decoding B-Picture bitstreams for display without requiring intermediate storage is known 
as "B-on-the-Fly". 

The present invention only requires two anchor frames at any one time. Where a further 
anchor frame appears in the Group Of Pictures ("GOP") sequence, the new anchor frame is 
encoded as a new I-Picture and stored in the Compressed Frame Buffers 407 by replacing 
the earlier of the encoded I-Picture or P-Picture already stored in the Compressed Frame 
Buffers 407. 
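
The replacement policy for anchor frames amounts to a two-slot store in which a new anchor evicts the older of the two held. The following is a minimal sketch of that policy only; the class name AnchorFrameStore and the use of string identifiers in place of compressed picture data are assumptions made for illustration.

```python
from collections import deque
from typing import List, Optional


class AnchorFrameStore:
    """Holds at most two re-encoded anchor frames, evicting the earlier one."""

    def __init__(self) -> None:
        self._slots = deque(maxlen=2)  # oldest anchor sits at index 0

    def store(self, frame_id: str) -> Optional[str]:
        """Store a new anchor; return the evicted anchor's id, if any."""
        evicted = self._slots[0] if len(self._slots) == 2 else None
        self._slots.append(frame_id)  # deque drops the oldest when full
        return evicted

    def anchors(self) -> List[str]:
        return list(self._slots)


store = AnchorFrameStore()
for anchor in ("I1", "P2", "P4"):
    store.store(anchor)
print(store.anchors())
```

After the third anchor arrives, the store holds only the two most recent anchors, which is all that forward and backward prediction of an intervening B-Picture requires.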

Figure 4 is a timing diagram illustrating an example of how the present invention processes 
an MPEG-2 video bitstream with a frame encoding sequence of {I, P, B, ...}. The frame 
encoding sequence in this example also reflects the order in which the frames will be 
decoded. t1 901 is the first field time of the first frame, and represents the time interval in 
which the I-Picture in the first frame is decoded for picture statistics. t2 902 is the second 
field time of the first frame, and represents the time interval in which the I-Picture in the 
first frame is fully decoded and re-encoded as a new I-Picture before it is stored in the 
Compressed Frame Buffers. 

t3 903 is the first field time of the second frame, and represents the time interval in which 
the P-Picture in the second frame is decoded for picture statistics. During the interval t3, 
the top field of the I-Picture from the first frame, I1t, is retrieved from the Compressed 
Frame Buffers and decoded by the Display Decoder in preparation for display. t4 904 is 
the second field time of the second frame, and represents the time interval in which the P- 
Picture in the second frame is fully decoded and re-encoded as a new I-Picture before it is 
stored in the Compressed Frame Buffers. During the interval t4, the bottom field of the I- 
Picture from the first frame, I1b, is retrieved from the Compressed Frame Buffers and 
decoded by the Display Decoder in preparation for display. 

t5 905 is the first field time of the third frame, and represents the time interval in which the 
top field of the B-Picture in the third frame is decoded by the Embedded Decoder (by 
retrieving from the Compressed Frame Buffers only those macroblocks representing the 
top field of the current B-Picture, B3t, as defined by the motion vectors in the current B- 
Frame) and then decoded by the Display Decoder for immediate display. t6 906 is the second 
field time of the third frame, and performs the same functionality described for t5 but is 
applied in respect of the bottom field of the B-Picture in the third frame, B3b. 
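
The field-time schedule described for Figure 4 can be tabulated programmatically. The sketch below is an illustration under stated assumptions: each picture occupies two field times, anchor pictures are parsed in the first and fully decoded and re-encoded in the second, the previously decoded anchor is displayed while a new anchor is processed, and B-Pictures are displayed as they are decoded. The function name field_schedule and the task strings are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, Optional[str]]


def field_schedule(coding_order: List[str]) -> List[Row]:
    """Return (field time, picture, decode task, displayed field) rows."""
    rows: List[Row] = []
    previous_anchor: Optional[str] = None
    t = 1
    for n, picture_type in enumerate(coding_order, start=1):
        name = f"{picture_type}{n}"
        if picture_type in ("I", "P"):
            tasks = ("parse for picture statistics",
                     "full decode and re-encode as I-Picture")
            shown = previous_anchor  # earlier anchor is displayed meanwhile
            previous_anchor = name
        else:
            tasks = ("decode top field on the fly",
                     "decode bottom field on the fly")
            shown = name             # B-Pictures go straight to display
        for field, task in zip(("top", "bottom"), tasks):
            display = f"{shown} {field} field" if shown else None
            rows.append((f"t{t}", name, task, display))
            t += 1
    return rows


for row in field_schedule(["I", "P", "B"]):
    print(row)
```

For the encoding sequence {I, P, B} this produces six rows, t1 to t6, with the top and bottom fields of the first I-Picture displayed during t3 and t4 and the two fields of the B-Picture displayed during t5 and t6.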

During the first field time of an I-Picture or a P-Picture, the Bitstream Parser 402 variable- 
length decodes the video bitstreams after removing the numerous headers (including those 
headers that define a sequence, GOP, picture, slice, or macroblock) and extracts the 
following picture parameters for storage and macroblock parameters for further processing 
in Data Analyzer 403. 

Picture parameters: 

• q_scale_type, the type of quantization table used (linear or non-linear); 

• Intra Quantizer Matrix, the two dimensional 8x8 quantization table used for 
intra-coding; 

• intra_dc_precision, the number of bits used for coding DC coefficients; 

• alternate_scan, the type of zig-zag scan to perform; and 

• intra_vlc_format, the type of variable length coding table used for intra- 
coding. 

Macroblock parameters: 

• s_i^b, the bit count of AC coefficients; 

• d_i^b, the bit count of DC coefficients; 

• q_i^b, the quantizer scale; 

• {mv_k}, k = 1, ..., K, the set of K full scale motion vectors decoded from the 
associated motion vector information; and 

• mb_intra, the boolean representation of intra-coded macroblocks. 

The parameters q_scale_type, intra_dc_precision, alternate_scan, intra_vlc_format and 
mb_intra are defined in the MPEG-2 Specifications. 

Preferably, the macroblock parameters are processed in the Data Analyzer 403 as 
information is extracted from the Bitstream Parser 402 during the first pass decoding and 
the MPEG Decoder 404 during the second pass decoding. The Bitstream Parser 402 or the 
MPEG Decoder 404 might not give the exact list of parameters desired since some of the 
parameters have to be computed outside the MPEG Decoder 404. For example, the bit 
count of AC coefficients can be computed from the difference between the bitstream 
pointer from the first AC coefficient decoding to, and not inclusive of, the end of block 
code. In addition, some of the macroblock parameters are computed from the extracted 
parameters and some are accumulated to form picture parameters. 

If the macroblock is intra-coded (i.e. if mb_intra = 1), the estimated macroblock 
complexity, z_i, and the estimated bit count of DC coefficients, d_i, will be defined as 
follows: 

z_i = s_i^b × q_i^b 
d_i = d_i^b 

If the macroblock is inter-coded or if the prediction error is coded (i.e. if mb_intra = 0), the 
estimated macroblock complexity, z_i, and the estimated DC coefficients bit count, d_i, 
will be defined as follows: 

z_i = s_i^c × q_i^c 
d_i = d_i^c 

where s_i^c, q_i^c and d_i^c are the bit count of AC coefficients, the quantizer scale and the bit 
count of DC coefficients respectively, as derived from the Compressed Frame Buffers 407. 
Statistical data are gathered from the Compressed Frame Buffers 407 rather than from the 
Bitstream Buffer 401 for inter-coded macroblocks, since the predicted macroblocks 
provide a closer image description to the original macroblock than the prediction error. 
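
The choice of statistics source can be sketched as follows. This is an illustration only, assuming that complexity is the product of the AC bit count and the quantizer scale; the MacroblockStats type and the estimate_properties function are hypothetical names, not part of the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MacroblockStats:
    ac_bits: int          # bit count of AC coefficients
    quantizer_scale: int  # quantizer scale used for the macroblock
    dc_bits: int          # bit count of DC coefficients


def estimate_properties(mb_intra: bool,
                        from_bitstream: MacroblockStats,
                        from_reference: MacroblockStats) -> Tuple[int, int]:
    """Return (estimated complexity, estimated DC bit count).

    Intra-coded macroblocks use statistics parsed from the incoming
    bitstream; inter-coded macroblocks use the statistics of the predicted
    macroblock held in the compressed frame buffers.
    """
    source = from_bitstream if mb_intra else from_reference
    return source.ac_bits * source.quantizer_scale, source.dc_bits


bitstream = MacroblockStats(ac_bits=120, quantizer_scale=8, dc_bits=20)
reference = MacroblockStats(ac_bits=300, quantizer_scale=4, dc_bits=24)
print(estimate_properties(True, bitstream, reference))
print(estimate_properties(False, bitstream, reference))
```

The inter-coded case ignores the bitstream statistics entirely, which reflects the reasoning above: the bitstream of an inter-coded macroblock describes only the prediction error, not the picture content that will be re-encoded.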

The set of motion vectors, {mv_k}, is used to locate the predicted macroblocks using 
the address references in the Macroblock Pointer Table. 

For the equations in this specification, the b superscript represents information derived 
from the Bitstream Buffer 401 while the c superscript represents information from the 
Compressed Frame Buffers 407. Thus, for example, d_i can be represented by d_i^b or d_i^c, 
depending on whether the macroblocks to which d_i relates are intra-coded or not (i.e. 
whether mb_intra = 1 for those macroblocks). 

Each macroblock is variable-length encoded and has a pointer reference to the start of the 
macroblock for easy reference in motion compensation. Reference herein is being made to 
a PAL picture (720 pixels x 576 lines) of D1 resolution. A macroblock defines a two 
dimensional region consisting of a 16 x 16 pixel array in the video image. Each picture 
thus requires a Macroblock Pointer Table with a maximum of 1620 entries (where 45 
macroblocks per line * 36 macroblocks per column = 1620 macroblock entries). A 
possible implementation with memory saving is to have a hierarchical pointer system 
implemented in the Macroblock Pointer Table. For example, the picture is divided into 
video segments, such that each video segment consists of five consecutive macroblocks. 
Each picture should have 324 full segment pointers (1620 macroblocks per picture / 5 
macroblocks per video segment = 324 segments) and four incremental macroblock pointers 
per segment, where each incremental macroblock pointer points to a macroblock relative to 
the previous macroblock within that segment. Hence, for a 4:2:0 chroma sampling format, 
the largest macroblock size would be 3072 bits (8x8 pixels per block * 6 blocks per 
macroblock * 8 bits per pixel = 3072 bits per macroblock) and the largest segment size 
would be 15,360 bits (3072 bits per macroblock * 5 macroblocks per video segment = 
15,360 bits) and the largest picture size would be 4,976,640 bits (3072 bits per macroblock 
* 1620 macroblocks per picture = 4,976,640 bits). Assuming that the compression does 
not expand the original pixel data, a 12 and 23 bit precision is defined for an incremental 
macroblock pointer and a full segment pointer respectively. As a result, a minimum of 
23,004 bits (324 segments per picture * (23 bits per full segment pointer + 12 bits per 
incremental macroblock pointer * 4 incremental macroblock pointers per segment) = 23,004 
bits) is required to implement the Macroblock Pointer Table. Preferably, an 8x8 block 
pointer system that provides a finer resolution is not used since it requires an additional 9 
bits per block (capped by the amount of uncompressed data per block) and amounts up to 
an additional 72,900 bits (9 bits per block * 5 incremental block pointers per macroblock * 
1620 macroblocks per picture = 72,900 bits) of memory. 
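
The sizing of the Macroblock Pointer Table can be reproduced step by step. The sketch below follows the figures given in the text; the helper pointer_bits, which takes the pointer width to be the number of bits needed to address any offset below the uncompressed size, is an assumption that happens to reproduce the stated 12, 23 and 9 bit precisions.

```python
MB_COLS, MB_ROWS = 720 // 16, 576 // 16            # 45 x 36 macroblocks
MB_PER_PICTURE = MB_COLS * MB_ROWS                 # 1620
MB_PER_SEGMENT = 5
SEGMENTS = MB_PER_PICTURE // MB_PER_SEGMENT        # 324

BLOCK_BITS = 8 * 8 * 8                             # 8x8 pixels, 8 bits each
MAX_MB_BITS = BLOCK_BITS * 6                       # 4:2:0 has 6 blocks: 3072
MAX_SEGMENT_BITS = MAX_MB_BITS * MB_PER_SEGMENT    # 15,360
MAX_PICTURE_BITS = MAX_MB_BITS * MB_PER_PICTURE    # 4,976,640


def pointer_bits(max_size_bits: int) -> int:
    """Bits needed to address any offset below max_size_bits (assumed rule)."""
    return (max_size_bits - 1).bit_length()


INC_POINTER_BITS = pointer_bits(MAX_MB_BITS)        # 12
FULL_POINTER_BITS = pointer_bits(MAX_PICTURE_BITS)  # 23

# One full pointer plus four incremental pointers per segment.
TABLE_BITS = SEGMENTS * (FULL_POINTER_BITS
                         + INC_POINTER_BITS * (MB_PER_SEGMENT - 1))

# Extra cost of the finer 8x8 block pointer system that the text rejects:
# five incremental block pointers for the six blocks of each macroblock.
BLOCK_TABLE_EXTRA_BITS = pointer_bits(BLOCK_BITS) * 5 * MB_PER_PICTURE

print(TABLE_BITS, BLOCK_TABLE_EXTRA_BITS)
```

The hierarchical table costs 23,004 bits, whereas block-level pointers would add a further 72,900 bits, more than three times the size of the table itself.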

By first identifying the segment, k, within which the macroblock, m, is located, and the 
positional offset, l, of the macroblock within that segment, it is possible to calculate the 
address of a predicted macroblock as follows: 

mb_address_m = segment_address_k + Σ_{i=0}^{l-1} mb_address_inc_i 

where mb_address_m is the absolute macroblock address of macroblock m; 
segment_address_k is the full segment address of segment k; and 
mb_address_inc_i is the incremental macroblock address of a macroblock with a 
position offset i within a segment k, and where i is an integer ranging from 0 to 
n-1 and n represents the number of macroblocks within a segment. 
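
The address calculation can be sketched as a lookup followed by a short sum. The sketch assumes that macroblocks are numbered in raster order, that segment k holds macroblocks 5k to 5k+4, and that each incremental address is the distance from one macroblock to the next within the segment; the function name and data layout are hypothetical.

```python
from typing import List


def macroblock_address(segment_addresses: List[int],
                       increments: List[List[int]],
                       mb_index: int,
                       mbs_per_segment: int = 5) -> int:
    """Absolute address of a macroblock in the compressed frame buffer.

    segment_addresses[k] is the full address of segment k; increments[k][i]
    is the offset from macroblock i to macroblock i + 1 inside segment k
    (four entries per segment of five macroblocks).
    """
    k, offset = divmod(mb_index, mbs_per_segment)
    # Sum the incremental addresses of all macroblocks before this one.
    return segment_addresses[k] + sum(increments[k][:offset])


segments = [0, 1000]
incs = [[100, 200, 150, 50], [300, 10, 20, 30]]
print(macroblock_address(segments, incs, 3))  # 4th macroblock of segment 0
print(macroblock_address(segments, incs, 7))  # 3rd macroblock of segment 1
```

Locating a macroblock therefore needs one full pointer read and at most four small additions, which is the trade made for storing 12-bit increments instead of a 23-bit pointer per macroblock.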

In contrast to the pixel resolution determined from the motion compensation scheme in a 
normal MPEG Decoder, macroblock resolution is used in the present decoder. Inter-coded 
macroblock properties are derived using motion vectors on the macroblock grid or a two 
dimensional boundary. 

Figures 4A, 4B and 4C illustrate examples of a predicted macroblock on the macroblock 
grid. As shown in Figure 5A, the predicted macroblock has a maximal overlap of pixels in 
macroblock D. The estimated macroblock complexity, $x_i$, and the estimated bit count of 
intra-DC coefficients for the inter-coded macroblock, $d_i^c$, are defined as: 

$$x_i = s_D^c \cdot q_D^c$$

$$d_i^c = d_D^c$$

where $s_D^c$, $q_D^c$ and $d_D^c$ correspond to the bit count of AC coefficients, the quantizer scale and 
the bit count of DC coefficients respectively, as derived from a simple decoding of 
macroblock D from the Compressed Frame Buffers 407. As shown in Figure 5B, if there 
is equal overlap on all macroblocks, any one of the four macroblocks will be used (for 
example macroblock D). As shown in Figure 5C, if there is overlap on only 2 
macroblocks, either one of the 2 macroblocks is used (for example macroblock B). A 
similar logic is applied at picture boundary conditions. 

However, in the case of multiple motion vectors, the average complexity, $\bar{x}_i$, and the 
average bit count of DC coefficients, $\bar{d}_i^c$, are taken over all motion vectors, and are 
represented by: 

$$\bar{x}_i = \frac{1}{K} \sum_{k=0}^{K-1} x_k$$

$$\bar{d}_i^c = \frac{1}{K} \sum_{k=0}^{K-1} d_k^c$$

where $x_k$ and $d_k^c$ respectively correspond to the macroblock complexity and the bit count 
of DC coefficients derived, as defined above, for the k-th motion vector from the set of 
motion vectors $\{mv_k\}_{k=0}^{K-1}$, and where K is the maximum number of motion vectors in the 
bitstream of a P-Picture according to the picture and motion prediction type, as tabulated in 
Table 1 below: 



Picture Type      Prediction Type      K
Frame             Frame                1
Frame             Field                2
Frame             Dual Prime           4
Field             Field                1
Field             16x8                 2
Field             Dual Prime           2

TABLE 1 

A similar procedure is derived for skipped macroblocks in the P-Picture by setting K = 1 
and the corresponding motion vector = 0. 
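
The averaging over motion vectors can be sketched as follows. The dictionary reproduces the K values of Table 1; the key spelling, the function name and the representation of each motion vector's derived properties as a (complexity, DC bit count) pair are assumptions made for illustration.

```python
# Maximum number of motion vectors K per (picture type, prediction type), per Table 1.
MAX_MOTION_VECTORS = {
    ("frame", "frame"): 1,
    ("frame", "field"): 2,
    ("frame", "dual_prime"): 4,
    ("field", "field"): 1,
    ("field", "16x8"): 2,
    ("field", "dual_prime"): 2,
}


def average_over_motion_vectors(per_vector):
    """Average complexity and DC bit count over the motion vectors of a macroblock.

    per_vector: one (complexity, dc_bits) pair per motion vector, each derived
                from the reference macroblock that the vector points at.
    """
    k = len(per_vector)
    if k == 0:
        raise ValueError("at least one motion vector is required")
    avg_complexity = sum(x for x, _ in per_vector) / k
    avg_dc_bits = sum(d for _, d in per_vector) / k
    return avg_complexity, avg_dc_bits


# Frame picture with field prediction: K = 2 motion vectors.
k = MAX_MOTION_VECTORS[("frame", "field")]
averages = average_over_motion_vectors([(1200, 40), (800, 60)])
```

A skipped macroblock fits the same function with a single entry derived from the zero motion vector.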

Referring again to Figure 3, the derived macroblock properties from the Compressed 
Frame Buffers 407 and the extracted macroblock properties from the Bitstream Parser 402 
are accumulated at a macroblock level in Data Analyzer 403. These macroblock properties 
are as follows: 

$X_i = \sum_{j=0}^{i} x_j$, the accumulated complexity; 

$S_i^b = \sum_{j=0}^{i} s_j^b$, the accumulated bit count of AC coefficients from the Bitstream 
Parser; 

$D_i^b = \sum_{j=0}^{i} d_j^b$, the accumulated bit count of DC coefficients from the Bitstream 
Parser; and 

$D_i^c = \sum_{j=0}^{i} d_j^c$, the accumulated bit count of DC coefficients from the Compressed 
Frame Buffers. 
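
A minimal sketch of this accumulation is given below. The class and field names are hypothetical; the sketch only mirrors the four running sums listed above and is not the Data Analyzer 403 itself.

```python
from dataclasses import dataclass


@dataclass
class PictureAccumulator:
    """Running sums kept at macroblock level over one picture."""
    complexity: float = 0.0      # accumulated complexity
    ac_bits_parser: int = 0      # AC bits reported by the Bitstream Parser
    dc_bits_parser: int = 0      # DC bits reported by the Bitstream Parser
    dc_bits_buffers: int = 0     # DC bits derived from the Compressed Frame Buffers
    macroblocks: int = 0

    def add(self, complexity, ac_bits_parser=0, dc_bits_parser=0, dc_bits_buffers=0):
        """Fold the properties of one macroblock into the running sums."""
        self.complexity += complexity
        self.ac_bits_parser += ac_bits_parser
        self.dc_bits_parser += dc_bits_parser
        self.dc_bits_buffers += dc_bits_buffers
        self.macroblocks += 1


acc = PictureAccumulator()
# An intra-coded macroblock contributes parser statistics ...
acc.add(complexity=2400.0, ac_bits_parser=300, dc_bits_parser=48)
# ... while an inter-coded one contributes statistics derived from the buffers.
acc.add(complexity=900.0, dc_bits_buffers=36)
```

After the last macroblock of the picture has been added, the fields hold the picture-level totals.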



When the last macroblock, M-1, of the picture has been reached, the picture complexity, 
$X_{pic}$, the picture DC coefficients bit count from the Bitstream Parser 402, $D_{pic}^b$, and the 
picture DC coefficients bit count from the Compressed Frame Buffers 407, $D_{pic}^c$, are 
obtained as the result of accumulation. Other picture statistics are computed at the picture 
level by the equations below. 

The bit count of a picture's DC coefficients is calculated as follows: 

$$d_{pic} = D_{pic}^b \cdot \mu_{pic\_type} + D_{pic}^c$$

where $\mu_{pic\_type}$ is the estimated compression factor of the DC coefficients bit count in the 
previous picture of the same type as described below. 

The target bit count of a picture's AC coefficients is calculated as follows: 

$$s_{pic}^T = T_{pic} - d_{pic} - \phi$$

where $\phi$ is the overhead size (in bits) of the compression including the Macroblock 
Pointer Table, the quantizer_scale_code (a 5 bit parameter from the MPEG-2 
Specification) and the dct_type (a 1 bit parameter from the MPEG-2 Specification) for all 
macroblocks in the picture, and where $T_{pic}$ is the target picture size. The quantizer scale is 
calculated from the quantizer_scale_code and q_scale_type. The dct_type indicates 
whether the discrete cosine transform is performed in frame format or field format. 
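
The picture-level budget can be illustrated with the sketch below. It assumes that the picture DC bit count combines the DC bits from the Bitstream Parser, scaled by the DC compression factor, with the DC bits derived from the Compressed Frame Buffers, and that the AC target is what remains of the target picture size after the DC bits and the overhead are subtracted. The function names and the example figures are illustrative only.

```python
def picture_dc_bits(dc_bits_parser, dc_bits_buffers, dc_factor):
    """Estimated DC bits of the re-encoded picture (assumed combination)."""
    return round(dc_bits_parser * dc_factor) + dc_bits_buffers


def overhead_bits(num_macroblocks, pointer_table_bits,
                  qscale_code_bits=5, dct_type_bits=1):
    """Pointer table plus the per-macroblock side information."""
    return pointer_table_bits + num_macroblocks * (qscale_code_bits + dct_type_bits)


def target_ac_bits(target_picture_bits, dc_bits, overhead):
    """Bits left for AC coefficients once DC bits and overhead are reserved."""
    return target_picture_bits - dc_bits - overhead


# Example: 1620 macroblocks, a 23,004 bit pointer table and a 10:1 picture target.
dc = picture_dc_bits(dc_bits_parser=20000, dc_bits_buffers=5000, dc_factor=0.75)
phi = overhead_bits(num_macroblocks=1620, pointer_table_bits=23004)
ac_target = target_ac_bits(target_picture_bits=497664, dc_bits=dc, overhead=phi)
```

With these example figures the overhead is under 7% of the picture target, so almost all of the budget is available to the AC coefficients.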
During the second field time of an I-Picture or P-Picture, the MPEG Decoder 404 decodes 
the MPEG video bitstream. During the full decoding process, the macroblock parameter 
$dct\_type_i^b$ and all the above mentioned macroblock parameters (other than the bit count of 
DC coefficients from the Bitstream Parser 402, $d_i^b$) are extracted. $dct\_type_i^b$ indicates 
whether frame IDCT or field IDCT was performed during the decoding process and is 
useful for encoding (such as in the Embedded Encoder 406). The extraction of the above 
mentioned macroblock parameters (other than the bit count of DC coefficients and the 
dct_type) during the first pass encoding for each macroblock is preferably repeated in the 
full decoding process (i.e. the second decoding process) so as to save buffer space for 
storing macroblock values, although it is not necessary for it to be repeated. Macroblock 
complexity may be computed again as described above for the Rate Control circuit 405. 
Additional parameters, like dct_type and min_qscale, are preferably derived in the full 
decoding process. However, these two parameters may be derived in the Bitstream Parser 
402 during the first pass decoding and stored for use during the full decoding process. 

min_qscale represents the minimum quantizer scale used and is relevant for the Mismatch 
Control circuit 605 (described later in the text) for controlling the minimum quantizer scale 
to be used for embedded encoding. If the quantizer scale used is lower than 
min_qscale, then there is no value added because quantization with a parameter smaller 
than that of the original encoded stream in the Bitstream Buffer 401 does not produce a 
better image quality. The bits saved in the process can be used for storing other macroblocks. 

For intra-coded macroblocks, 

$$dct\_type_i = dct\_type_i^b$$

$$min\_qscale_i = q_i^b$$

For inter-coded and skipped macroblocks, 

$$dct\_type_i = dct\_type_i^c$$

$$min\_qscale_i = min\_qscale_i^c$$

The $dct\_type_i^c$ and $min\_qscale_i^c$ parameters correspond to the dct type and minimum 
quantizer scale parameters respectively, and are derived from the Compressed Frame 
Buffers 407 using the set of motion vectors $\{mv_k\}_{k=0}^{K-1}$. The c superscript represents 
information from the Compressed Frame Buffers 407. 

For one motion vector $mv_k$ (with reference to Figure 5A), $dct\_type_k^c$ is the dct_type of 
macroblock D and $min\_qscale_k^c$ is the minimum quantizer scale of the four or fewer 
macroblocks of interest. Both are derived as follows: 

$$dct\_type_k^c = dct\_type_D$$

$$min\_qscale_k^c = \min\{q_j\}, \quad j \in \{A, B, C, D\}$$

Similar derivations can be done where there is an equal overlap (as shown in Figures 4B 
and 4C) and for picture boundary conditions. 



In the case of K multiple motion vectors, 

$$dct\_type_i^c = \begin{cases} default\_dct\_type & \text{if } \sum_{k=0}^{K-1} dct\_type_k^c = K/2 \\ major\_dct\_type & \text{otherwise} \end{cases}$$

$$min\_qscale_i^c = \min\{min\_qscale_k^c\}_{k=0}^{K-1}$$

where $dct\_type_k^c$ refers to the DCT type derived for the k-th motion vector, which is derived 
in a similar way to the case for the one motion vector above; default_dct_type refers to a 
fixed dct_type value assigned for cases where equal numbers of motion vectors have the 
same dct_type; major_dct_type refers to the conformance to the majority of the dct_type 
that is derived by the set of motion vectors; $min\_qscale_i^c$ is the minimum min_qscale 
derived from the same set of motion vectors; and $min\_qscale_k^c$ refers to the minimum 
quantizer scale derived from the k-th motion vector, which is derived in a similar way to the 
case for the one motion vector above. 
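
The selection rule for several motion vectors can be sketched as follows, assuming dct_type takes the values 0 and 1 as in MPEG-2. The function name and the pair representation of each motion vector's derived values are hypothetical.

```python
def derive_from_motion_vectors(per_vector, default_dct_type):
    """Combine the per-motion-vector dct_type and min_qscale values.

    per_vector:       one (dct_type, min_qscale) pair per motion vector.
    default_dct_type: value used when the dct_type votes are evenly split.
    """
    k = len(per_vector)
    if k == 0:
        raise ValueError("at least one motion vector is required")
    ones = sum(dct_type for dct_type, _ in per_vector)
    if 2 * ones == k:
        dct_type = default_dct_type          # equal numbers of each type
    else:
        dct_type = 1 if 2 * ones > k else 0  # conform to the majority
    min_qscale = min(q for _, q in per_vector)
    return dct_type, min_qscale


tie = derive_from_motion_vectors([(1, 8), (0, 6)], default_dct_type=0)
majority = derive_from_motion_vectors([(1, 10), (1, 4), (0, 6), (1, 12)],
                                      default_dct_type=0)
```

Comparing twice the count of ones against K avoids a fractional K/2 when K is odd.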

In addition, referring to Figure 3, Data Analyzer 403 updates two parameters (i.e. the 
default_dct_type and DC compression factor) at the picture level in the first or second field 
time for use in the subsequent pictures. 

$$default\_dct\_type = \begin{cases} 1 & \text{if } \sum_{i=0}^{N-1} dct\_type_i > N/2 \\ 0 & \text{otherwise} \end{cases}$$

where N is the number of macroblocks in a picture and default_dct_type is the majority of 
dct_type used for the macroblocks in the picture. 

The DC compression factor is calculated as follows: 

$$\mu_{pic\_type} = \frac{D_{pic}^{c*}}{D_{pic}^b}$$

where pic_type is an I-Picture or P-Picture, $D_{pic}^b$ is the bit count of picture DC coefficients 
from the Bitstream Parser 402, and $D_{pic}^{c*}$ is the bit count of picture DC coefficients from 
the re-encoded macroblocks that were intra-coded in the Bitstream Buffer 401. This 
compression factor is calculated for different picture types and may be used in the 
subsequent pictures of the same type. Typically, $\mu_{pic\_type}$ can be initialized to 1. 
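
The two picture-level updates can be sketched as below. The sketch assumes that the DC compression factor is the ratio of the re-encoded DC bit count to the original DC bit count of the intra-coded macroblocks, and that the previous factor is kept when a picture contains no intra-coded DC bits; both points, like the function names, are assumptions made for illustration.

```python
def update_default_dct_type(dct_types):
    """Majority dct_type (0 or 1) over the macroblocks of a picture."""
    return 1 if 2 * sum(dct_types) > len(dct_types) else 0


def update_dc_compression_factor(dc_bits_parser, dc_bits_reencoded, previous=1.0):
    """Assumed ratio of re-encoded to original DC bits for one picture type."""
    if dc_bits_parser == 0:
        return previous  # nothing to measure; keep the earlier estimate
    return dc_bits_reencoded / dc_bits_parser


# One factor is kept per picture type, initialised to 1.
dc_factor = {"I": 1.0, "P": 1.0}
dc_factor["I"] = update_dc_compression_factor(20000, 15000, dc_factor["I"])
dc_factor["P"] = update_dc_compression_factor(0, 0, dc_factor["P"])
default_dct_type = update_default_dct_type([1, 1, 0, 1, 0])
```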

With the picture and macroblock statistical information, the Rate Control circuit 405 
derives a suitable quantizer scale for encoding. Figure 6 shows a functional block diagram 
of the rate control scheme. A Macroblock Bit Allocation circuit 601 (shown as 305 in 
Figure 2) allocates the number of bits to be used for encoding the AC coefficients of the 
macroblock according to scaled macroblock complexity, which is defined as follows: 

$$s_i^T = \frac{x_i}{X_{pic}} \left( s_{pic}^T + \varepsilon_{pic} \right)$$

where $s_i^T$ is the target number of bits for encoding AC coefficients for the macroblock; 
$x_i$ is the estimated complexity for the macroblock; 
$X_{pic}$ is the estimated complexity from current picture statistics; 
$s_{pic}^T$ is the target bits for encoding all AC coefficients from current picture 
statistics; and 
$\varepsilon_{pic} = \eta e_{i-1} + \gamma \sum_{k=0}^{i-1} e_k$ is the proportional integral control adjustment for the picture 
level. 

The Qscale Prediction circuit 602 (shown as 306 in Figure 2) then predicts the quantizer 
scale with the following equation: 

$$q_i = \frac{x_i}{s_i^T} + \varepsilon_i$$

where $q_i$ is the predicted quantizer scale for the macroblock; 
$x_i$ is the estimated complexity of the macroblock; 
$s_i^T$ is the target number of bits used for encoding AC coefficients for the 
macroblock; and 
$\varepsilon_i = \alpha e_{i-1} + \beta \sum_{k=0}^{i-1} e_k$ is the proportional integral control adjustment for the 
macroblock level. 
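
The bit allocation and quantizer scale prediction steps can be illustrated with the sketch below. It rests on two assumptions that should be checked against the equations of the specification: that the AC target of a macroblock is its share of the picture complexity applied to the adjusted picture AC target, and that the predicted quantizer scale is the macroblock complexity divided by that target plus the macroblock-level adjustment. All names are hypothetical.

```python
def allocate_ac_bits(mb_complexity, picture_complexity,
                     picture_ac_target, picture_adjustment=0.0):
    """Target AC bits for one macroblock, scaled by its share of complexity."""
    if picture_complexity <= 0:
        raise ValueError("picture complexity must be positive")
    return mb_complexity * (picture_ac_target + picture_adjustment) / picture_complexity


def predict_qscale(mb_complexity, mb_ac_target, mb_adjustment=0.0):
    """Predicted quantizer scale before saturation and rounding."""
    if mb_ac_target <= 0:
        raise ValueError("macroblock AC target must be positive")
    return mb_complexity / mb_ac_target + mb_adjustment


# A macroblock holding 0.2% of the picture complexity, with no control adjustment.
target = allocate_ac_bits(2000.0, 1_000_000.0, 400_000.0)
qscale = predict_qscale(2000.0, target)
```

With both adjustments at zero every macroblock receives the same predicted quantizer scale, namely the picture complexity divided by the picture AC target; the control adjustments are what differentiate macroblocks during encoding.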
To stabilize bit rate generation and improve rate control accuracy, a two-tier closed 
control loop is implemented. The error, $e_i = s_i^T - s_i^c$, defined as the difference between the 
target and re-encoded AC coefficient bit count, is fed back to outer and inner Proportional 
Integral (PI) control circuits 603 and 604 for the picture and macroblock level, 
respectively. The inner (macroblock) PI controller 604 compensates for the inaccuracies 
of the Qscale Prediction circuit 602 at the macroblock level and adds an error adjustment, 
calculated by $\alpha e_{i-1} + \beta \sum_{k=0}^{i-1} e_k$, to the predicted quantizer scale, $q_i$, where $\alpha = -0.0008$ and 
$\beta = -0.0005$ are exemplary values only. The 
bounds of these values are related to the convergence of the control loop. 

On the other hand, the outer (picture) PI controller 603 is concerned with the general 
stability of the bit rate generation and ensures the convergence of the target picture bit 
count. It adds an error adjustment, calculated by $\eta e_{i-1} + \gamma \sum_{k=0}^{i-1} e_k$, to the target AC 
coefficient bit count before the scaled macroblock bit allocation takes place. In this case, 
example values can be $\eta = 0$ and $\gamma = 1.5$ since the impact of local error feedback is less 
than that of integral error feedback for stability purposes. These values are exemplary 
values only. The bounds of these values are related to the convergence of the control loop. 
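
The two PI controllers share the same structure and differ only in their gains and in where the adjustment is applied. The sketch below uses the exemplary gains quoted above; the class and method names are hypothetical, and the error is taken as target minus re-encoded AC bits.

```python
class PIController:
    """Proportional integral adjustment driven by the bit count error."""

    def __init__(self, proportional_gain, integral_gain):
        self.proportional_gain = proportional_gain
        self.integral_gain = integral_gain
        self.last_error = 0.0   # error of the previous macroblock
        self.error_sum = 0.0    # sum of all errors so far in the picture

    def adjustment(self):
        """Adjustment to apply to the next macroblock."""
        return (self.proportional_gain * self.last_error
                + self.integral_gain * self.error_sum)

    def update(self, target_bits, reencoded_bits):
        """Record the error once a macroblock has been re-encoded."""
        error = target_bits - reencoded_bits
        self.last_error = error
        self.error_sum += error


inner = PIController(-0.0008, -0.0005)   # added to the predicted quantizer scale
outer = PIController(0.0, 1.5)           # added to the picture AC target

# A macroblock that overshoots its 800 bit target by 200 bits.
inner.update(800, 1000)
outer.update(800, 1000)
qscale_adjustment = inner.adjustment()
target_adjustment = outer.adjustment()
```

An overshoot gives a negative error, so the negative inner gains raise the quantizer scale (coarser quantization) while the positive outer gain lowers the remaining bit target; both push the bit count back toward the target.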

Mismatch control is performed by a Mismatch Control circuit 605 after 
the quantizer scale prediction and error adjustments. The objective is to match the discrete 
set of quantizer scale values defined by q_scale_type and quantizer_scale_code in the 
MPEG specifications. First, the Mismatch Control circuit 605 performs saturation on the 
incoming quantizer scale, $q_i$, to the range of 2 to 62 for linear quantization (i.e. where 
q_scale_type = 0), and to the range of 1 to 112 for non-linear quantization (i.e. where 
q_scale_type = 1). Then the Mismatch Control circuit 605 rounds the saturated quantizer 
scale to a discrete quantizer scale value according to the linear and non-linear quantization 
table. 

In addition, the Mismatch Control circuit 605 compares the discrete quantizer scale to the 
min_qscale variable and forces the quantizer scale to equate with the min_qscale variable 
if the quantizer scale is smaller than the value of min_qscale. This is to ensure that the 
Embedded Encoder (shown as item 406 in Figure 3) will encode with a quantizer scale no 
finer or smaller than its original quantizer scale, as picture quality does not improve and 
encoding bits can be saved. 
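
The mismatch control steps can be sketched as follows. The two tables list the quantizer scale values that MPEG-2 associates with quantizer_scale_code 1 to 31 for each q_scale_type. Rounding to the nearest table value, breaking ties toward the smaller value, and treating min_qscale as a floor so that the encoder never quantizes more finely than the original stream are assumptions of this sketch.

```python
# Quantizer scale values for quantizer_scale_code 1..31.
LINEAR_SCALES = [2 * code for code in range(1, 32)]             # 2 .. 62
NONLINEAR_SCALES = [1, 2, 3, 4, 5, 6, 7, 8,
                    10, 12, 14, 16, 18, 20, 22, 24,
                    28, 32, 36, 40, 44, 48, 52, 56,
                    64, 72, 80, 88, 96, 104, 112]                # 1 .. 112


def mismatch_control(qscale, q_scale_type, min_qscale):
    """Map a predicted quantizer scale onto a legal value, floored at min_qscale."""
    table = NONLINEAR_SCALES if q_scale_type == 1 else LINEAR_SCALES
    # Step 1: saturate to the legal range of the selected table.
    saturated = max(table[0], min(table[-1], qscale))
    # Step 2: round to the nearest discrete value (ties go to the smaller value).
    discrete = min(table, key=lambda value: (abs(value - saturated), value))
    # Step 3: never quantize more finely than the original stream did.
    return max(discrete, min_qscale)


examples = [
    mismatch_control(0.4, 0, 2),     # saturated up to the linear minimum
    mismatch_control(13.2, 1, 1),    # rounded to the nearest non-linear value
    mismatch_control(200.0, 1, 1),   # saturated down to the non-linear maximum
    mismatch_control(5.1, 0, 8),     # raised to min_qscale
]
```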

However, for I-Pictures, if the estimated bit count of picture AC coefficients, calculated by 
$S_{pic}^b = \sum_j s_j^b$, is less than the target bit count of picture AC coefficients, $s_{pic}^T$, the original 
quantizer scale, $q_i^b$, is used instead. 

In an alternative embodiment, a slight adaptation involving $\tilde{x}_i$ (the normalized 
macroblock complexity) can be used instead of the macroblock complexity, $x_i$. The 
reason is that high complexity macroblocks have image details that are not compromised 
subjectively by using a coarser quantization step size, whereas low complexity 
macroblocks with smooth and homogeneous areas are more sensitive to the 
quantization step size and thus require finer quantization. 

Normalized complexity is defined as follows: 

where $\bar{x}$ is the average macroblock complexity of the previous re-encoded picture. 
Normalized complexity reduces the variation of macroblock complexity, raises the 
quantizer scale for higher complexity macroblocks and lowers the quantizer scale for low 
complexity macroblocks. The replacement of macroblock complexity, $x_i$, by the 
normalized complexity, $\tilde{x}_i$, in the above steps concerning rate control achieves the 
advantage of having better subjective quality and higher rate control accuracy. 
The Embedded Encoder (shown as item 406 in Figure 3) re-encodes all anchor frames 
(namely I-frames and P-frames) and all the components of the Embedded Encoder are 
shown in Figure 7. A Discrete Cosine Transform (DCT) circuit 701 performs a two 
dimensional 8x8 DCT on the pixel output of the decoded video bitstream from the MPEG 
Decoder (shown as item 404 in Figure 3). For a frame picture, a frame DCT is performed 
if the dct_type is determined to be 0 in the Data Analyzer (shown as item 403 in Figure 3), 
otherwise a field DCT is performed. The resultant DCT coefficients are quantized by a 
Quantizer (Q) circuit 702 using the 8x8 intra-quantizer matrix from the Data Analyzer 403 
and the derived quantizer scale from the Rate Control circuit (shown as item 405 in Figure 
3). The 8x8 quantized coefficients are then re-arranged in a zig-zag manner by a zig-zag 
(ZZ) circuit 703 according to the predetermined alternate_scan format parameter from the 
Data Analyzer 403. A Variable-Length Coder (VLC) circuit 704 subsequently 
variable-length encodes the one dimensional data in the chosen intra_vlc_format determined 
from the Data Analyzer 403 using run-length encoding and the Huffman table defined in the 
MPEG-2 specifications and also differential encoding of the luminance DC coefficients 
within the macroblocks. At the same time, the number of bits generated by the VLC 
circuit 704 is tracked and fed back to the Rate Control circuit 405. 

The VLC circuit 704 also ensures minimal expansion of video data on an 8x8 block basis by 
limiting the maximum number of bits for an 8x8 block to 682 bits, or by dropping the last 
few coefficients if the limit is exceeded. The reason for this is that the macroblock pointer 
has a fixed precision of 12 bits and any expansion of data works against memory savings. 
Furthermore, limiting the bit count on the expanded data rate does not have an adverse 
effect on the picture quality. The entire encoding process is similar to the MPEG encoder 
except for the absence of the motion estimator, the decoding loop, motion vectors and 
MPEG conforming bitstreams. The components in Embedded Encoder 406 can be built 
from standard components in an MPEG Encoder. 
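
The 682 bit limit is consistent with the 12 bit macroblock pointer, since 6 blocks * 682 bits = 4092 bits, which stays below 4096. The sketch below illustrates one way to drop trailing coefficients; representing a block as a list of variable-length code sizes in scan order, and ignoring any end-of-block code, are simplifying assumptions, and the function name is hypothetical.

```python
MAX_BLOCK_BITS = 682
BLOCKS_PER_MACROBLOCK = 6
MB_POINTER_BITS = 12


def cap_block_bits(code_lengths, max_bits=MAX_BLOCK_BITS):
    """Keep the leading codes of a block that fit within max_bits.

    code_lengths: bit length of each variable-length code, in scan order.
    Returns the kept code lengths and their total bit count.
    """
    kept, total = [], 0
    for length in code_lengths:
        if total + length > max_bits:
            break                  # drop this code and everything after it
        kept.append(length)
        total += length
    return kept, total


# A pathological block of thirty 24 bit codes (720 bits) is truncated.
kept, total = cap_block_bits([24] * 30)
```

Because the codes are in scan order, the dropped coefficients are the highest-frequency ones.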

Referring again to Figure 3, as the bitstreams from the Embedded Encoder 406 are 
generated, they are stored into the Compressed Frame Buffers 407 and the Macroblock 
Pointer Table is updated with the new starting address of each macroblock. In the 
decoding of B-Pictures, anchor frames are required to provide the forward and backward 
prediction, giving rise to the need for at least two compressed anchor frames (I-frames or 
P-frames stored as I-Pictures) in the Compressed Frame Buffers 407. For example, with a 
10:1 compression scheme, the capacity of the Compressed Frame Buffers 407 will be 
approximately equal to 1 megabit (calculated by 2 frames * 0.1 being the compression 
factor * 4,976,640 bits per frame = 995,328 bits, or approximately 1 megabit). 

Memory space may be shared between the Video Bitstream Buffer 401, the Compressed 
Frame Buffers 407, the Still Picture Buffer (not shown) and the On-screen Display 
Graphics Buffer (not shown). The Still Picture Buffer stores a compressed still picture 
from the bitstream that may be decoded as a background picture during run time. The 
On-screen Display Graphics Buffer stores graphics including texts and logos that are overlaid 
on screen for special features, for example the channel menu. The size of the Video 
Bitstream Buffer 401 varies with the video bitrate, with an upper limit determined by the 
maximum bit rate of the video bitstream. The size of the Compressed Frame Buffers may 
therefore be changed for every video bitstream or for the decoding of a particular bitstream 
according to application needs and memory availability. The number of bits for a picture 
may also be dynamically allocated for maximal picture quality. 

In an alternative embodiment, the present system may be extended to also perform lower 
resolution picture decoding. Figure 8 shows a modified architecture of the decoder to 
support lower resolution picture or zoom-out picture decoding. The architecture in Figure 
8 involves the inclusion of a Decimation Filter 811 and an Interpolation Filter 812. All 
other components in Figure 8 have the same functionality as the correspondingly 
numbered components described in Figure 3. The Decimation Filter 811 performs 
a horizontal spatial decimation of the display pixels to the required resolution using a 
digital decimation filter, for example a 7 tap filter with coefficients [-29, 0, 88, 138, 88, 0, 
-29] for a 2:1 decimation, before the picture is encoded by Embedded Encoder 406. The 
horizontally down-scaled version of the picture is stored as variable-length encoded 
bitstreams in the Compressed Frame Buffers 407 and is decoded when necessary for 
display by the Display Decoder 408. 

An Interpolation Filter 812 is needed to perform a horizontal spatial interpolation to 
achieve full D1 resolution motion compensation using a digital interpolation filter, for 
example a 7 tap filter with coefficients [-12, 0, 140, 256, 140, 0, -12] for a 1:2 
interpolation. The two filters can be designed jointly and filters for luminance and 
chrominance components can be customized to maximize picture quality. These filters are 
standard components of a video pre-processor subsystem in video related application 
systems. 
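
The two example filters can be exercised with the sketch below, which works on one row of 8 bit samples. The tap values come from the text; the normalization by 256 (the decimation taps sum to 256 and the interpolation taps to 512, which gives unity gain after zero insertion), the rounding, the edge replication and the clipping to 0..255 are assumptions, as are the function names.

```python
DECIMATION_TAPS = [-29, 0, 88, 138, 88, 0, -29]        # sum = 256
INTERPOLATION_TAPS = [-12, 0, 140, 256, 140, 0, -12]   # sum = 512


def _scale_and_clip(acc):
    """Divide by 256 with rounding and clip to the 8 bit range."""
    return max(0, min(255, (acc + 128) >> 8))


def decimate_2to1(row):
    """Filter a row and keep every second sample (edges replicated)."""
    half = len(DECIMATION_TAPS) // 2
    padded = [row[0]] * half + list(row) + [row[-1]] * half
    out = []
    for n in range(0, len(row), 2):
        acc = sum(c * padded[n + k] for k, c in enumerate(DECIMATION_TAPS))
        out.append(_scale_and_clip(acc))
    return out


def interpolate_1to2(row):
    """Insert a zero after every sample, then filter (edges replicated)."""
    half = len(INTERPOLATION_TAPS) // 2
    padded = [row[0]] * 2 + list(row) + [row[-1]] * 2
    stuffed = []
    for sample in padded:
        stuffed.extend([sample, 0])
    out = []
    for n in range(2 * len(row)):
        centre = n + 4             # row[0] sits at index 4 of the stuffed row
        acc = sum(c * stuffed[centre - half + k]
                  for k, c in enumerate(INTERPOLATION_TAPS))
        out.append(_scale_and_clip(acc))
    return out


flat = [100] * 16
down = decimate_2to1(flat)
up = interpolate_1to2(down)
```

A flat row passes through both filters unchanged, which confirms the gain normalization; real pictures lose horizontal detail in the decimation step that the interpolation cannot restore.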

Besides using the above filters, the Data Analyzer 403 may be adapted to support 
zoom-out modes where the D1 horizontal resolution is an integer multiple, $\theta$, of a lower 
horizontal resolution picture. The adaptation includes additional analysis being performed 
once for every $\theta$ macroblocks decoded to compute corresponding parameters of the 
to-be-displayed macroblock. Macroblock complexity and quantizer scale are averaged for $\theta$ 
macroblocks, the minimum min_qscale is selected among $\theta$ macroblocks and the majority 
dct_type for $\theta$ macroblocks is conformed to. 
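
The per-group analysis for zoom-out modes can be sketched as follows. The record type, the function name and the use of a default dct_type to resolve an evenly split vote are assumptions made for illustration.

```python
from collections import namedtuple

MacroblockInfo = namedtuple(
    "MacroblockInfo", "complexity qscale min_qscale dct_type")


def aggregate_for_zoom_out(group, default_dct_type=0):
    """Combine the decoded macroblocks that map onto one displayed macroblock."""
    n = len(group)
    if n == 0:
        raise ValueError("group must contain at least one macroblock")
    ones = sum(mb.dct_type for mb in group)
    if 2 * ones == n:
        dct_type = default_dct_type             # evenly split vote (assumed rule)
    else:
        dct_type = 1 if 2 * ones > n else 0     # conform to the majority
    return MacroblockInfo(
        complexity=sum(mb.complexity for mb in group) / n,
        qscale=sum(mb.qscale for mb in group) / n,
        min_qscale=min(mb.min_qscale for mb in group),
        dct_type=dct_type,
    )


# Two decoded macroblocks per displayed macroblock (horizontal ratio of 2).
pair = [MacroblockInfo(1000, 8, 6, 1), MacroblockInfo(2000, 12, 4, 1)]
displayed = aggregate_for_zoom_out(pair)
```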

The bit counts of DC and AC coefficients (for I-Pictures), $d_i^b$ and $s_i^b$ respectively, are 
accumulated for every decoded macroblock as described in the earlier text. The bit counts 
of the respective DC and AC coefficients in a picture (for I-Pictures) are calculated based 
on the to-be-displayed macroblock, as follows: 

It is obvious to those skilled in the art that special considerations may be made for the 
picture boundary in the case where the width or height of the decimated picture is not an 
integer multiple of macroblocks. 

To reduce speed complexity issues regarding the B-on-the-Fly decoding of B-Pictures for 
low resolution display, two extra picture buffers may be included as part of the system to 
temporarily store B-Pictures. As such, B-pictures may be compressed similar to the anchor 
frame pictures as described above. The size of the Compressed Frame Buffers 407 is not 
compromised as lower resolution pictures produce acceptable picture quality at a smaller 
target picture bit count. 

Further simplification may be done to reduce decoding from a two-pass to a single-pass 
decoding process for B-Pictures. As such, the computation of the Data Analyzer 403 
during the first field of decoding is eliminated and the macroblock bit allocation scheme in 
the Rate Control Circuit 405 is adapted to allocate a constant or averaged target bit count 
of AC coefficients, defined by: 

$$s_i^T = \frac{s_{pic}^T}{N}$$

where N is the number of macroblocks in the low resolution picture, and the bit count of 
picture DC coefficients (and hence the target bit count of picture AC coefficients) may be 
estimated from the previous picture of the same type. 

The term "circuit" as used herein with reference to functional components is intended to 
include any applicable hardware componentry which can accomplish the appropriate 
function and may include processor chips and ASICs as well as basic electronic logic 
components. The "circuits" may also be implemented as modules executed in software or 
a combination of software and hardware. 






All of the above U.S. patents, U.S. patent application publications, U.S. patent 
applications, foreign patents, foreign patent applications and non-patent publications referred to 
in this specification and/or listed in the Application Data Sheet, are incorporated herein by 
reference, in their entirety. 

From the foregoing it will be appreciated that, although specific embodiments of 
the invention have been described herein for purposes of illustration, various modifications may 
be made without deviating from the spirit and scope of the invention. Accordingly, the invention 
is not limited except as by the appended claims. 



